Part-of-Speech Tagging of Portuguese Based on Variable Length Markov Chains

Authors

  • Fábio Natanael Kepler
  • Marcelo Finger
Abstract

Tagging is the task of attributing to words in context in a text their corresponding Part-of-Speech (PoS) class. In this work, we have employed Variable Length Markov Chains (VLMC) for tagging, in the hope of capturing long distance dependencies. We obtained one of the best PoS tagging results for Portuguese, with a precision of 95.51%. More surprisingly, we did that with a total training and execution time of less than 3 minutes for a corpus of almost 1 million words. However, long distance dependencies are not well captured by the VLMC tagger, and we investigate the reasons and limitations of the use of VLMCs. Future research in statistical linguistics regarding long range dependencies should concentrate on other ways of solving this limitation.
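
To make the variable-length context idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`train_contexts`, `longest_context`, `next_tag_distribution`), the `max_order` and `min_count` parameters, and the toy tag sequences are all assumptions made for illustration. It also swaps in a plain count threshold for the statistical pruning a real VLMC would use to decide which contexts to keep.

```python
from collections import Counter, defaultdict


def train_contexts(tag_sequences, max_order=4, min_count=2):
    """Count next-tag frequencies for every tag context of length 0..max_order,
    then keep only contexts observed at least min_count times.

    Assumption: a simple count threshold replaces the statistical pruning
    criterion of a real VLMC. The empty context is always kept as a fallback.
    """
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        for i, tag in enumerate(tags):
            for k in range(max_order + 1):
                if i - k < 0:
                    break  # not enough history inside this sentence
                context = tuple(tags[i - k:i])
                counts[context][tag] += 1
    return {
        ctx: c
        for ctx, c in counts.items()
        if len(ctx) == 0 or sum(c.values()) >= min_count
    }


def longest_context(model, history, max_order=4):
    """Return the longest suffix of `history` that the model has stored."""
    for k in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        if context in model:
            return context
    return ()


def next_tag_distribution(model, history, max_order=4):
    """Relative-frequency estimate of the next tag, conditioned on the
    longest stored suffix of `history`. Assumes non-empty training data."""
    context = longest_context(model, history, max_order)
    counter = model[context]
    total = sum(counter.values())
    return context, {tag: n / total for tag, n in counter.items()}


# Hypothetical toy data: three tagged sentences, tags only.
sequences = [
    ["DET", "N", "V"],
    ["DET", "N", "V"],
    ["DET", "ADJ", "N"],
]
model = train_contexts(sequences, max_order=2, min_count=2)
context, dist = next_tag_distribution(model, ["DET", "N"], max_order=2)
print(context, dist)
```

The point of the sketch is that the amount of history used varies per prediction: a frequently seen context such as `("DET", "N")` is used in full, while a rare one falls back to a shorter suffix, possibly the empty context. A complete tagger would also need word emission probabilities and a decoding step, which are left out here.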

Similar resources

Part-of-Speech Tagging using a Variable Memory Markov Model

We present a new approach to disambiguating syntactically ambiguous words in context, based on Variable Memory Markov (VMM) models. In contrast to fixed-length Markov models, which predict based on fixed-length histories, variable memory Markov models dynamically adapt their history length based on the training data, and hence may use fewer parameters. In a test of a VMM based tagger on the Bro...

Part-of-Speech Tagging of Persian Using a Fuzzy Network Model

Part-of-speech tagging (POS tagging) is an ongoing research area in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...

Variable-Length Markov Models and Ambiguous Words in Portuguese

Variable-Length Markov Chains (VLMCs) offer a way of modeling contexts longer than trigrams without suffering from data sparsity and state space complexity. However, in Historical Portuguese, two words show a high degree of ambiguity: que and a. The number of errors tagging these words corresponds to a quarter of the total errors made by a VLMC-based tagger. Moreover, these words seem to show tw...

Publication date: 2006